fix: use native fetch for Ollama embedding to ensure AbortController works #362

jlin53882 wants to merge 2 commits into CortexReach:master
Conversation
Root cause: the OpenAI SDK HTTP client does not reliably abort Ollama TCP connections when AbortController.abort() fires in Node.js. This causes stalled sockets that hang until the gateway-level 120s timeout.

Fix: add isOllamaProvider() to detect localhost:11434 endpoints, and embedWithNativeFetch(), which uses Node.js 18+ native fetch instead of the OpenAI SDK. Native fetch properly closes TCP connections on abort.

Tests: added Test 8 (testOllamaAbortWithNativeFetch) to the cjk-recursion-regression test suite, plus a standalone test (pr354-standalone.mjs) and a 30-iteration stress test (pr354-30iter.mjs).

Fixes CortexReach#361.
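As a rough illustration of the detection step, here is a minimal sketch assuming the provider is identified purely by its base URL; the actual `isOllamaProvider()` in `embedder.ts` may inspect other configuration:

```typescript
// Hedged sketch: detect the default local Ollama endpoint (localhost:11434).
// The real isOllamaProvider() in embedder.ts may check additional config.
function isOllamaProvider(baseUrl: string): boolean {
  try {
    const url = new URL(baseUrl);
    const localHosts = new Set(["localhost", "127.0.0.1", "[::1]"]);
    return localHosts.has(url.hostname) && url.port === "11434";
  } catch {
    // Unparsable base URL: treat as non-Ollama and keep the SDK path.
    return false;
  }
}
```

For example, `isOllamaProvider("http://localhost:11434/v1")` would return `true`, while an OpenAI cloud URL would not match.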
**CI Failure Diagnosis**

**What's failing**

**Why it's failing**

The assertion at line 282 expects the error message to match:

```js
/ollama embedding failed|404|Failed to generate embedding from Ollama/i.test(errorCaught.message)
```

None of those patterns match the error actually raised in CI.

**The fix**

Change:

```js
assert.ok(
  /ollama embedding failed|404|Failed to generate embedding from Ollama/i.test(errorCaught.message),
  "Error should come from Ollama native fetch path, got: " + errorCaught.message
);
```

to:

```js
assert.ok(
  /ollama embedding failed|404|Failed to generate embedding from Ollama|Embedding provider unreachable/i.test(errorCaught.message),
  "Error should come from Ollama native fetch path, got: " + errorCaught.message
);
```

This accounts for the fact that in CI (no Ollama running), the connection-refused error from native fetch gets wrapped before it reaches the assertion.

**Note on the standalone test files**
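To sketch why the extra pattern is needed: if the native-fetch path wraps low-level failures in a higher-level message, the CI error carries the wrapper text rather than an Ollama-specific one. A hypothetical illustration (the wrapper's name and exact wording in the repo may differ):

```typescript
// Hypothetical wrapper: low-level fetch failures (ECONNREFUSED in CI, where
// no Ollama is running) surface as "Embedding provider unreachable: ...",
// which is why the assertion regex must accept that phrase.
async function wrapEmbedErrors(call: () => Promise<number[]>): Promise<number[]> {
  try {
    return await call();
  } catch (err) {
    const cause = err instanceof Error ? err.message : String(err);
    throw new Error(`Embedding provider unreachable: ${cause}`);
  }
}
```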
Replaced by PR #383 — rebased, cleaned up test dependencies, AliceLJY assertion fix applied.
Thanks for the detailed diagnosis @AliceLJY! The follow-up has been addressed in a clean rebase at PR #383.

Please take a look at PR #383 when you have time. Thanks!
**PR Chain & Related Issues**

This PR (#362) is part of a chain of work. Here's the full relationship:

**Core Fix: Ollama Native Fetch**

**Discovered During Analysis**

While investigating CI failures on #383, two additional issues were found and addressed:

**Other Related PRs (recent merges)**

**Recommended Merge Order**

@AliceLJY — the cli-smoke failure on #383 is caused by the pre-existing reflection-bypass-hook test failure (#384/#385). Once #385 is merged, please re-trigger CI on #383. Thanks!
Superseded by #383. |
Problem
When using Ollama as the embedding provider, the `embedQuery` timeout (`EMBED_TIMEOUT_MS` = 10s) does not reliably abort stalled Ollama HTTP requests. This causes the gateway-level `autoRecallTimeoutMs` timeout (120s) to fire instead.
Evidence from gateway logs:
```
20:48:46 auto-recall query truncated from 1233 to 1000 chars
[120 seconds of silence]
20:50:46 auto-recall timed out after 120000ms
```
CPU ~20%, Ollama CPU ~0% — the signature of a hanging HTTP connection.
Root Cause
`embedder.ts` uses the OpenAI SDK to call Ollama. The SDK's HTTP client in Node.js does not reliably abort the underlying TCP connection when `AbortController.abort()` is called. Ollama keeps processing, and the socket hangs until the 120s gateway timeout fires.
Fix
Use **native `fetch`** for Ollama endpoints. Node.js 18+ native `fetch` correctly respects `AbortController`: the TCP connection is properly closed when the signal fires.
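A minimal sketch of what the native-fetch path can look like, assuming Ollama's OpenAI-compatible embeddings route; the names, endpoint, and response shape here are illustrative, not the exact `embedder.ts` code:

```typescript
// Sketch only: POST to an embeddings endpoint with an AbortController-backed
// timeout. Node 18+ native fetch tears down the TCP socket on abort.
async function embedWithNativeFetch(
  baseUrl: string,
  model: string,
  input: string,
  timeoutMs: number,
): Promise<number[]> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetch(`${baseUrl}/embeddings`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, input }),
      signal: controller.signal, // abort closes the connection, no 120s hang
    });
    if (!res.ok) {
      throw new Error(`Ollama embedding failed: HTTP ${res.status}`);
    }
    const json = (await res.json()) as { data: { embedding: number[] }[] };
    return json.data[0].embedding;
  } finally {
    clearTimeout(timer); // don't leave a dangling timer on success or error
  }
}
```

The key design point is that the timeout drives `controller.abort()` directly, so the socket's lifetime is bound to the embed timeout rather than to any gateway-level fallback.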
Added
Modified
`embedWithRetry()` now routes Ollama URLs through `embedWithNativeFetch()` instead of the OpenAI SDK.
Test
30 iterations — all passed ✅
Abort time consistent at ~208–215ms (signal fires at 200ms).
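The stress test's core measurement can be sketched like this (a re-creation under assumptions, not the actual pr354-30iter.mjs code): schedule `abort()` after a fixed delay and time how long the request actually takes to die.

```typescript
// Sketch: time how long a fetch survives after abort() is scheduled.
// With native fetch, the elapsed time should land just past abortAfterMs
// (the PR reports ~208–215ms for a 200ms signal), not at a gateway timeout.
async function measureAbortMs(url: string, abortAfterMs: number): Promise<number> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), abortAfterMs);
  const start = Date.now();
  try {
    await fetch(url, { signal: controller.signal });
  } catch {
    // Expected: an AbortError once the signal fires (or a connection error).
  } finally {
    clearTimeout(timer);
  }
  return Date.now() - start;
}
```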
Fixes #361